This notebook was put together by [Jake Vanderplas](http://www.vanderplas.com) for UW's [Astro 599](http://www.astro.washington.edu/users/vanderplas/Astr599/) course. Source and license info is on [GitHub](https://github.com/jakevdp/2013_fall_ASTR599/).
Today's session is going to be a grab-bag of information that will help you write clean, usable, and maintainable code. Working with Python in science is much more than simply the mechanics of writing code: it's about writing code in a way that allows it to be used, re-used, adapted, and understood by collaborators and your future self.
We'll be talking about a few key components of this: documentation, testing, and packaging. Again, this will be a quick summary, but I'll try to give some useful examples and resources to learn more.
In [1]:
import numpy as np
np.linalg.svd?
You can print out the documentation string using the help()
function:
In [2]:
help(np.linalg.svd)
The idea with documentation is to give a broad description of what the algorithm does, as well as a detailed description of arguments and return values.
The way you do this with your own functions is by using a multi-line string literal at the beginning of the function. For example, let's write a documentation string for the fibonacci function we looked at last week.
First, check what happens if we don't document the function:
In [3]:
def fib(N):
x = np.zeros(N, dtype=float)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
In [4]:
fib?
In [5]:
help(fib)
Not too helpful. Now let's re-define the function with a doc string:
In [6]:
def fib(N):
"""
Compute the first N Fibonacci numbers
Parameters
----------
N : integer
The number of Fibonacci numbers to compute
Returns
-------
x : np.ndarray
the length-N array containing the first N
Fibonacci numbers.
Notes
-----
This is a pure Python implementation. For large N,
consider a Cython implementation
Examples
--------
>>> fib(5)
array([ 0., 1., 1., 2., 3.])
"""
x = np.zeros(N, dtype=float)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
Now let's take a look at the help string for the function:
In [7]:
fib?
In [8]:
help(fib)
You should get in the habit of always documenting your functions and classes in this way. It will make sharing your code (and using your code yourself!) much, much easier in the long run.
There is no requirement for how your code should be documented, but a standard in the scientific Python community is to follow the NumPy doc-string standard. This outlines many potential sections, but the basic example should look something like this:
In [9]:
def example_function():
"""Short description of the function.
Longer multi-line description of the function,
with a bit more information on the function's intent.
Parameters
----------
varable_name : <type>
short description (4-space indent)
another_variable : <type>
short description, potentially with default value
Returns
-------
return_value : <type>
description
another_return_value: <type>
more description
Notes
-----
Extended discussion of the details of the algorithm
References
----------
Any website or paper with information about the algorithm
or implementation
Examples
--------
Clarifying examples of how to use the function. Use the
Python interpreter syntax (starting with >>>) and show the
output:
>>> print "hello world"
hello world
"""
return None
This emphasis on formatting may seem a bit extreme, but it's nice for two reasons:
We saw the documentation for np.linalg.svd
above. By running this through Sphinx, the following documentation is generated. This can be very handy if you write code that you end up sharing with others.
In [10]:
from IPython.display import IFrame
IFrame(('http://docs.scipy.org/doc/numpy/'
'reference/generated/numpy.linalg.svd.html'),
width=600, height=400)
Out[10]:
In [11]:
class Multiplier(object):
"""A simple class implementing multiplication
Parameters
----------
x : float
an integer value
"""
def __init__(self, x):
self.x = x
def get_multiple(self, n):
"""Return n * x
Parameters
----------
n : float
multiplier
Returns
-------
y : float
y = n * x
"""
return n * self.x
Now we can look at the documentation. Notice that a list of methods is automatically included.
In [12]:
Multiplier?
In [13]:
help(Multiplier)
In [14]:
Multiplier.get_multiple?
In [15]:
help(Multiplier.get_multiple)
In [16]:
%%file mymodule.py
"""
Simple module to show module-level documentation.
You can write some extra information here.
It can be as long as you'd like...
Functions:
- product : multiply two numbers
- quotient : divide two numbers
"""
def product(a, b):
"""return the product of a and b"""
return a * b
def quotient(a, b):
"""retur the quotient of a and b"""
return a / b
In [17]:
import mymodule
mymodule?
In [18]:
help(mymodule)
One other important piece of good coding practice is to write tests, particularly Unit Tests. Unit tests are tests of the individual pieces of a computational process, and can be very helpful for both giving yourself peace-of-mind that your code is behaving properly, and making sure your code still works as expected if and when you start optimizing the slower pieces.
Let's write a small fib
module which contains test code. We'll copy and paste the function definition from above, and then add a test function
In [19]:
%%file fibmodule.py
"""
Functions to compute Fibonacci sequences
"""
import numpy as np
from numpy.testing import assert_allclose
def fib(N):
"""
Compute the first N Fibonacci numbers
Parameters
----------
N : integer
The number of Fibonacci numbers to compute
Returns
-------
x : np.ndarray
the length-N array containing the first N
Fibonacci numbers.
Notes
-----
This is a pure Python implementation. For large N,
consider a Cython implementation
Examples
--------
>>> fib(5)
array([ 0., 1., 1., 2., 3.])
"""
x = np.zeros(N, dtype=float)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
def test_first_ten():
nums = fib(10)
assert_allclose(fib(10),
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
Now you can import and run the test to make sure it passes:
In [20]:
import fibmodule
fibmodule.test_first_ten()
This approach works, but if you have many many tests (which you generally will for larger projects), running them by-hand can be tedious.
Fortunately, there are testing frameworks which streamline this process. Most of the Scientific Python stack uses the Nose testing framework to do this.
First, make sure you have nose installed by running
[~]$ conda install nose
(or, if you're not using Anaconda, pip install nose
)
Then from the command-line, run the following (-v
stands for "verbose"):
In [21]:
!nosetests -v fibmodule
Nosetests goes through the module and looks for any function which starts with test_
, and runs them, checking for errors.
There are many options for writing and organizing your tests. For more information, look at the Nose documentation or browse the source code of a package like Scikit-learn, which utilizes an extensive nose testing framework (see especially the sklearn.tests submodule).
One thing nose does which is very useful is doctests. The Doc-testing framework goes through the documentation strings of the functions, and looks for code snippets which start with >>>
, and then runs them to make sure the results are correct.
You can run the doctests on our fibmodule
using the following command:
In [22]:
!nosetests -v --with-doctest fibmodule
We see that it ran two tests, one of which is the doc test within the fib()
function.
Just to make sure it's doing what it claims, let's write a quick module with a failing test and see what Nose tells us:
In [23]:
%%file fail.py
"""A module with a failing test"""
def test_fail_at_math():
"""Here's a test that fails"""
assert 1 + 1 == 3
def test_succeed_at_math():
"""Here's a test that passes"""
assert 1 + 1 == 2
In [24]:
!nosetests -v fail
As advertised, it finds the failing test! Notice especially that the output shows the first line of the documentation string of each test function. This can be really helpful when diagnosing test failures.
Eventually, you will write code which is so awesome that you'll want to share it with other people. In order to do this, you'll want to create a well-tested, well-documented module, potentially with a setup.py
script to aid in installing the module.
There are several good ways to approach this, but generally good practice is to do a directory structure which looks something like this:
RootDirectory: this is your git repository,
| if you're using git
| mymodule/
| | __init__.py : this is the file that tells Python
| | that your module is a module. It
| | contains code which is executed
| | when the module is imported
| |
| | file1.py : Any number of .py files containing
| | file2.py : your awesome, well-documented code
| | ...
| |
| | tests/
| | | __init__.py : lets Python know this is a
| | | submodule.
| | |
| | | test_file1.py : Any number of .py files which
| | | test_file2.py : contain test scripts for your
| | | ... functions. Organize them in
| | | a way that helps you remember
| | | what they are.
|
| README.md : general information about the module
| LICENSE : a license for code re-use.
| Consider the BSD license
| INSTALL : an installation file
| setup.py : the setup script (see below)
Notice that every module and submodule has an __init__.py
file, which may or may not contain any code. The __init__.py
file is executed once when the module or submodule is loaded.
The setup.py
file contains a script that allows you to install your module system-wide using the command
[~]$ python setup.py install
There are way too many details to go into here regarding setup scripts and Python's distutils: for a good tutorial, take a look at Python's distutils documentation. You may also want to look at other packages to see how they use this file (for example, check out my astroML setup file).
Your homework is to create a module containing a Cosmology()
class, which computes some quantities using expressions from the Hogg 1999 Distance Measures paper. We took a brief look at this problem in the boot camp at the beginning of the quarter (Lecture 8).
This will be a simple example of test-driven development, a development style in which the desired functionality is first written as a series of tests, and then code is written which passes those tests.
You'll be creating a cosmology
package, adding some simple cosmological distance measures, and testing and documenting the code. In your homework git repository, create the following directory structure:
HW8/
| cosmology/
| | __init__.py
| | cosmology.py
| | tests/
| | | __init__.py
| | | test_cosmology.py
The files should contain the following:
HW8/cosmology/__init__.py
:"""Cosmology tools
<add more description here>
"""
# We'll import the Cosmology class using
# a "relative import":
from .cosmology import Cosmology
HW8/cosmology/tests/__init__.py
"""Cosmology Tests"""
# Nothing else here
HW8/cosmology/Cosmology.py
(see below -- you'll be modifying this file)
HW8/cosmology/tests/test_cosmology.py
(see below)
Your task is to fill-in the class and method definitions in cosmology.py
using the formulae from the Hogg paper. Once this is all set up, you should be able to run IPython from within the HW8
directory and type, e.g.
In [1]: from cosmology import Cosmology
In [2]: cosmo = Cosmology()
In [3]: cosmo.DC(0)
Out[3]: 0.0
From the top directory (HW8
) run nosetests -v cosmology
to test your implementation, and make sure all the tests pass. Be sure to add suitable documentation strings to the class as well, including a doctests or two for each method (make sure that these also pass, using nosetests --with-doctest
).
In [ ]:
%%file cosmology.py
import numpy as np
from scipy import integrate
# speed of light
C = 299792.458 # km/s
class Cosmology(object):
"""Cosmology class implementing Cosmological Distance Functions
<write an appropriate doc-string: be sure to reference Hogg paper>
"""
def __init__(self, OmegaM=0.3, h=0.7):
# for now, we'll implement only
# a flat cosmology for simplicity
self.OmegaK = 0
self.OmegaM = OmegaM
self.OmegaL = 1. - OmegaM
# Hubble constant, km/s/Mpc
self.H0 = h * 100
# Hubble Distance, Mpc
self.DH = C / self.H0
def _Einv(self, z):
# implement the inverse of Eqn 14. This is the function that will be
# integrated in order to compute other quantities
pass
def DC(self, z):
"""Comoving Distance (Mpc)
<fill in doc string>
"""
# Compute the comoving distance in Mpc using scipy.integrate.quad
# following Eqn 15
def DM(self, z):
"""Transverse Comoving Distance (Mpc)
<fill in doc string>
"""
# Compute the transverse comoving distance in Mpc (Eqn 16)
def DA(self, z):
"""Angular Diameter Distance (Mpc)
<fill in doc string>
"""
# Compute the Angular Diameter distance in Mpc (Eqn 18)
def DL(self, z):
"""Luminosity Distance (Mpc)
<fill in doc string>
"""
# Compute the Luminosity Distance in Mpc (Eqn 21)
def mu(self, z):
"""Distance Modulus (magnitudes)
<fill in doc string>
"""
# Compute the distance modulus (Eqn 25)
In [ ]:
%%file test_cosmology.py
"""Tests of the Cosmology class"""
from numpy.testing import assert_allclose
from .. import Cosmology
# [OmegaM, h, z, DC]
DC_RESULTS =\
[[0.2, 0.6, 0, 0.0],
[0.2, 0.6, 0.5, 2285.61],
[0.2, 0.6, 1.0, 4114.6],
[0.2, 0.7, 0, 0.0],
[0.2, 0.7, 0.5, 1959.09],
[0.2, 0.7, 1.0, 3526.8],
[0.3, 0.6, 0, 0.0],
[0.3, 0.6, 0.5, 2203.4],
[0.3, 0.6, 1.0, 3854.47],
[0.3, 0.7, 0, 0.0],
[0.3, 0.7, 0.5, 1888.63],
[0.3, 0.7, 1.0, 3303.83],
[0.4, 0.6, 0, 0.0],
[0.4, 0.6, 0.5, 2131.92],
[0.4, 0.6, 1.0, 3649.21],
[0.4, 0.7, 0, 0.0],
[0.4, 0.7, 0.5, 1827.36],
[0.4, 0.7, 1.0, 3127.89]]
# [OmegaM, h, z, DM]
DM_RESULTS =\
[[0.2, 0.6, 0, 0.0],
[0.2, 0.6, 0.5, 2285.61],
[0.2, 0.6, 1.0, 4114.6],
[0.2, 0.7, 0, 0.0],
[0.2, 0.7, 0.5, 1959.09],
[0.2, 0.7, 1.0, 3526.8],
[0.3, 0.6, 0, 0.0],
[0.3, 0.6, 0.5, 2203.4],
[0.3, 0.6, 1.0, 3854.47],
[0.3, 0.7, 0, 0.0],
[0.3, 0.7, 0.5, 1888.63],
[0.3, 0.7, 1.0, 3303.83],
[0.4, 0.6, 0, 0.0],
[0.4, 0.6, 0.5, 2131.92],
[0.4, 0.6, 1.0, 3649.21],
[0.4, 0.7, 0, 0.0],
[0.4, 0.7, 0.5, 1827.36],
[0.4, 0.7, 1.0, 3127.89]]
# [OmegaM, h, z, DA]
DA_RESULTS =\
[[0.2, 0.6, 0, 0.0],
[0.2, 0.6, 0.5, 1523.74],
[0.2, 0.6, 1.0, 2057.3],
[0.2, 0.7, 0, 0.0],
[0.2, 0.7, 0.5, 1306.06],
[0.2, 0.7, 1.0, 1763.4],
[0.3, 0.6, 0, 0.0],
[0.3, 0.6, 0.5, 1468.93],
[0.3, 0.6, 1.0, 1927.23],
[0.3, 0.7, 0, 0.0],
[0.3, 0.7, 0.5, 1259.08],
[0.3, 0.7, 1.0, 1651.91],
[0.4, 0.6, 0, 0.0],
[0.4, 0.6, 0.5, 1421.28],
[0.4, 0.6, 1.0, 1824.61],
[0.4, 0.7, 0, 0.0],
[0.4, 0.7, 0.5, 1218.24],
[0.4, 0.7, 1.0, 1563.95]]
# [OmegaM, h, z, DL]
DL_RESULTS =\
[[0.2, 0.6, 0, 0.0],
[0.2, 0.6, 0.5, 3428.41],
[0.2, 0.6, 1.0, 8229.21],
[0.2, 0.7, 0, 0.0],
[0.2, 0.7, 0.5, 2938.64],
[0.2, 0.7, 1.0, 7053.61],
[0.3, 0.6, 0, 0.0],
[0.3, 0.6, 0.5, 3305.09],
[0.3, 0.6, 1.0, 7708.93],
[0.3, 0.7, 0, 0.0],
[0.3, 0.7, 0.5, 2832.94],
[0.3, 0.7, 1.0, 6607.66],
[0.4, 0.6, 0, 0.0],
[0.4, 0.6, 0.5, 3197.88],
[0.4, 0.6, 1.0, 7298.42],
[0.4, 0.7, 0, 0.0],
[0.4, 0.7, 0.5, 2741.04],
[0.4, 0.7, 1.0, 6255.79]]
# [OmegaM, h, z, mu]
MU_RESULTS = \
[[0.2, 0.6, 0.1, 38.666337868889762],
[0.2, 0.6, 0.5, 42.675462188569789],
[0.2, 0.6, 1.0, 44.57678993423005],
[0.2, 0.7, 0.1, 38.331603920736697],
[0.2, 0.7, 0.5, 42.340728240416723],
[0.2, 0.7, 1.0, 44.242055986076984],
[0.3, 0.6, 0.1, 38.649938522544012],
[0.3, 0.6, 0.5, 42.595919369693959],
[0.3, 0.6, 1.0, 44.434971603696795],
[0.3, 0.7, 0.1, 38.31520457439094],
[0.3, 0.7, 0.5, 42.261185421540894],
[0.3, 0.7, 1.0, 44.100237655543722],
[0.4, 0.6, 0.1, 38.633904578021593],
[0.4, 0.6, 0.5, 42.524308199311712],
[0.4, 0.6, 1.0, 44.316144393006496],
[0.4, 0.7, 0.1, 38.299170629868527],
[0.4, 0.7, 0.5, 42.189574251158646],
[0.4, 0.7, 1.0, 43.981410444853431]]
def assert_equal_to_2_decimals(x, y):
assert_allclose(x, y, atol=0.01)
def test_DC():
"""Test Comoving Distance"""
for (OmegaM, h, z, DC) in DC_RESULTS:
cosmo = Cosmology(OmegaM=OmegaM, h=h)
yield assert_equal_to_2_decimals, DC, cosmo.DC(z)
def test_DM():
"""Test Transverse Comoving Distance"""
for (OmegaM, h, z, DM) in DM_RESULTS:
cosmo = Cosmology(OmegaM=OmegaM, h=h)
yield assert_equal_to_2_decimals, DM, cosmo.DM(z)
def test_DA():
"""Test Angular Diameter Distance"""
for (OmegaM, h, z, DA) in DA_RESULTS:
cosmo = Cosmology(OmegaM=OmegaM, h=h)
yield assert_equal_to_2_decimals, DA, cosmo.DA(z)
def test_DL():
"""Test Luminosity Distance"""
for (OmegaM, h, z, DL) in DL_RESULTS:
cosmo = Cosmology(OmegaM=OmegaM, h=h)
yield assert_equal_to_2_decimals, DL, cosmo.DL(z)
def test_mu():
"""Test Distance Modulus"""
for (OmegaM, h, z, MU) in MU_RESULTS:
cosmo = Cosmology(OmegaM=OmegaM, h=h)
yield assert_equal_to_2_decimals, MU, cosmo.mu(z)